fix(sdk): force_flush returns meaningful bool on MetricReader#5085
fix(sdk): force_flush returns meaningful bool on MetricReader#5085ravitheja4531-cell wants to merge 7 commits into
Conversation
|
could someone from @jd take a look when you have a moment? |
|
I personally like this pretty much. After it's merged, we'll have to open a new issue to indicate failure reason. Will look at the code with more attention soon. Btw, are you planning to add unit tests for the new behavior? |
| **kwargs, | ||
| ) -> None: | ||
| """Called by `MetricReader.collect` when it receives a batch of metrics""" | ||
| ) -> bool: |
There was a problem hiding this comment.
| ) -> bool: | |
| ) -> bool | None: |
|
|
||
| @final | ||
| def collect(self, timeout_millis: float = 10_000) -> None: | ||
| def collect(self, timeout_millis: float = 10_000) -> Optional[bool]: |
There was a problem hiding this comment.
| def collect(self, timeout_millis: float = 10_000) -> Optional[bool]: | |
| def collect(self, timeout_millis: float = 10_000) -> bool | None: |
| collect_ok = super().force_flush(timeout_millis=timeout_millis) | ||
| exporter_ok = self._exporter.force_flush(timeout_millis=timeout_millis) | ||
| return collect_ok and exporter_ok |
There was a problem hiding this comment.
- Should we test this logic?
- Should we even call the exporter's
force_flushif the general one fails?
There was a problem hiding this comment.
Addressed all review feedback:
Changed Optional[bool] → bool | None on collect and _receive_metrics (per @herin049)
Removed Optional from imports entirely
Fixed force_flush short-circuit: exporter.force_flush is now skipped if super().force_flush fails (per @Asquator)
Added 4 unit tests covering success, failure, short-circuit, and detach always running in finally
Ready for re-review. Thanks!
60da606 to
5680779
Compare
MikeGoldsmith
left a comment
There was a problem hiding this comment.
This looks good to me. Thanks @ravitheja4531-cell.
I've left a couple of small tweaks.
|
Noted changed "must" to "should" in the docstring and updated the CHANGELOG to use the PR number. Thanks! |
|
can you guys please approve it @herin049 @Asquator @MikeGoldsmith |
|
In the original code, detach(token) was called at the end of the method body but not inside a finally block. If an uncaught exception escaped the except Exception handler (e.g. a BaseException like KeyboardInterrupt), detach would never be called, leaking the context token. Moving it to finally ensures it always runs. Happy to remove the mention from the CHANGELOG if you'd prefer to keep the entry focused on the force_flush behavior. |
|
The handlng of All depends on maintainer preferences and how pedantic the library is about separating changes. Anyway, if it's left here, the merge commit message should be more generic/include both changes. |
MikeGoldsmith
left a comment
There was a problem hiding this comment.
This now looks good to me, pending @herin049's feedback.
|
Hi @MikeGoldsmith @Asquator please approve so that the checks run |
|
I already approved many times. We need a maintainer to approve the workflows. |
ravitheja4531-cell
left a comment
There was a problem hiding this comment.
Fixed — moved pmr.shutdown() inside the with patch(...) block so the mock doesn't leak into other tests. All 270 metrics tests now pass including test_cpu_time_callback and test_cpu_time_generator. Thanks @emdneto for catching this!
|
It shows that the tests are still failing |
|
@ravitheja4531-cell please check CI. It's showing failures, likely because of the linter. Please run |
Fixes open-telemetry#5020 Signed-off-by: Ravi Theja <ravitheja4531@gmail.com>
719f80c to
23c4e8e
Compare
ravitheja4531-cell
left a comment
There was a problem hiding this comment.
Updated test_detach_called_on_export_failure to use wraps=export_module.detach so the real detach function still executes while allowing call verification. All 3 tests pass locally including test_cpu_time_callback and test_cpu_time_generator. Also rebased onto latest main.
ravitheja4531-cell
left a comment
There was a problem hiding this comment.
Fixed — moved import opentelemetry.sdk.metrics._internal.export as _export_module to top level and used wraps=_export_module.detach in the patch so real detach still executes. All pre-commit checks pass locally.
|
Thanks for the PR! Just a heads-up: we no longer update Please add the appropriate changelog fragment for this change instead of editing |
ravitheja4531-cell
left a comment
There was a problem hiding this comment.
Hi @emdneto — I checked CONTRIBUTING.md on main and there's no mention of Towncrier or changelog fragments. Recent merged PRs (e.g. #5179, #5163) are all editing CHANGELOG.md directly. I've added the entry to CHANGELOG.md as per the existing convention. Please let me know if there's a different process I should follow.
|
Hi @emdneto: Can you with your review when you get a chance. |
|
@emdneto: Did you get a chance to review/any other additional steps needed. |
|
Hello, is there a chance this can be merged in the near future? |
Is I want to merge soon, please help me know if anything else needs to be done. |
|
I see the feature itself as done. Maybe there are some caveats with the .md files indeed, I don't know much about that. |
Can someone help to move faster and merge this PR please @IndiTheo @herin049 @MikeGoldsmith |
Hello Team, @MikeGoldsmith @herin049 how can we advance from here ? Need assistance to merge this asap. |
Hello @MikeGoldsmith @herin049: Any insights on this PR ? Requesting assistance to merge this sooner. |
Fixes #5020
Description
force_flushonMetricReaderandPeriodicExportingMetricReaderalwaysreturned
Trueregardless of whether the export succeeded or failed, violatingthe OTel specification:
This PR threads the actual export result up through
_receive_metrics→collect→force_flushso callers get a meaningful bool.Also fixes a pre-existing bug where
detach(token)inPeriodicExportingMetricReader._receive_metricswas only called on the happypath — moved to
finallyso it always runs.Type of change
How Has This Been Tested?
Ran the full existing metrics test suite locally: